We are moving beyond the limitations of linear models, which struggle to classify data that is not separable by a straight line. Today, we apply the PyTorch workflow to build a Deep Neural Network (DNN) capable of learning complex, non-linear decision boundaries essential for real-world classification tasks.
1. Visualizing the Need for Non-Linearity
Our first step is to create a challenging synthetic dataset, such as the two-moons distribution, to visually demonstrate why simple linear models fail; a short setup sketch follows the data properties below. This setup forces us to use deep architectures to approximate the intricate curve needed to separate the classes.
Data Properties
- Data Structure: Synthetic data features (e.g., $1000 \times 2$ for $1000$ samples with 2 features).
- Output Type: A single probability value per sample, typically stored as torch.float32, representing class membership.
- Goal: To create a curved decision boundary through layered computation.
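Here is a minimal data-setup sketch under these assumptions: scikit-learn's make_moons generates the two-moons distribution, and the sample count and noise level shown are illustrative, not values fixed by the lesson.

```python
# Minimal sketch: generate the two-moons dataset and convert it to tensors.
# Assumes scikit-learn is available; n_samples and noise are illustrative.
import torch
from sklearn.datasets import make_moons

X_np, y_np = make_moons(n_samples=1000, noise=0.1, random_state=42)

X = torch.tensor(X_np, dtype=torch.float32)               # shape: (1000, 2)
y = torch.tensor(y_np, dtype=torch.float32).unsqueeze(1)  # shape: (1000, 1)

print(X.shape, y.shape)  # torch.Size([1000, 2]) torch.Size([1000, 1])
```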
The Power of Non-Linear Activations
The core principle of DNNs is the introduction of non-linearity in hidden layers via functions like ReLU. Without these, stacking layers would simply result in one large linear model, regardless of depth.
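A minimal sketch of such a stack is shown below; the hidden width of 16 units is an illustrative choice, not a value specified in the lesson. Removing the nn.ReLU() modules would collapse the whole stack into a single linear map, which is exactly the failure mode described above.

```python
# Sketch of a small DNN for the two-moons task (layer sizes are illustrative).
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(2, 16),   # 2 input features -> 16 hidden units
    nn.ReLU(),          # non-linearity: lets the network bend the boundary
    nn.Linear(16, 16),
    nn.ReLU(),
    nn.Linear(16, 1),   # 1 raw logit per sample (sigmoid applied in the loss)
)
```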
Question 1
What is the primary purpose of the ReLU activation function in a hidden layer?
Question 2
Which activation function is required in the output layer for a binary classification task?
Question 3
Which loss function corresponds directly to a binary classification problem using a Sigmoid output?
Challenge: Designing the Core Architecture
Integrating architectural components for non-linear learning.
You must build an nn.Module for the two-moons task. Input features: 2. Output classes: 1 (probability).
Step 1
Describe the flow of computation for a single hidden layer in this DNN.
Solution:
Input $\to$ Linear Layer (Weight Matrix) $\to$ ReLU Activation $\to$ Output to Next Layer.
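A minimal sketch of that flow as a custom nn.Module (the hidden width of 16 and the class name are assumptions for illustration, not part of the challenge statement):

```python
import torch
import torch.nn as nn

class MoonClassifier(nn.Module):
    def __init__(self, hidden: int = 16):
        super().__init__()
        self.hidden = nn.Linear(2, hidden)   # Linear layer (weight matrix + bias)
        self.act = nn.ReLU()                 # non-linear activation
        self.out = nn.Linear(hidden, 1)      # next layer (output logit)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = self.hidden(x)   # Input -> Linear Layer
        a = self.act(z)      # -> ReLU Activation
        return self.out(a)   # -> Output to Next Layer
```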
Step 2
What must the final layer size be if the input shape is $(N, 2)$ and we use BCE loss?
Solution:
The final linear layer must have a single output unit, so the model produces an output of shape $(N, 1)$: one probability score per sample, matching the label shape.
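As a quick, hedged check of this shape requirement (the layer widths and random data are illustrative; nn.BCEWithLogitsLoss is used here in place of a separate Sigmoid plus nn.BCELoss because it fuses the two for numerical stability):

```python
import torch
import torch.nn as nn

# Illustrative stand-in for the challenge model: 2 inputs -> 1 output logit.
model = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 1))
loss_fn = nn.BCEWithLogitsLoss()            # fuses Sigmoid + BCE for stability

X = torch.randn(1000, 2)                    # (N, 2) input features
y = torch.randint(0, 2, (1000, 1)).float()  # (N, 1) binary labels

logits = model(X)                           # (N, 1) raw scores
print(logits.shape)                         # torch.Size([1000, 1])
loss = loss_fn(logits, y)                   # shapes match -> loss computes
```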